38 research outputs found

    Topological network alignment uncovers biological function and phylogeny

    Sequence comparison and alignment has had an enormous impact on our understanding of evolution, biology, and disease. Comparison and alignment of biological networks will likely have a similar impact. Existing network alignments use information external to the networks, such as sequence, because no good algorithm for purely topological alignment has yet been devised. In this paper, we present a novel algorithm, based solely on network topology, that can be used to align any two networks. We apply it to biological networks to produce by far the most complete topological alignments of biological networks to date. We demonstrate that both species phylogeny and detailed biological function of individual proteins can be extracted from our alignments. Topology-based alignments have the potential to provide a completely new, independent source of phylogenetic information. Our alignment of the protein-protein interaction networks of two very different species, yeast and human, indicates that even distant species share a surprising amount of network topology with each other, suggesting broad similarities in internal cellular wiring across all life on Earth.

    Probabilistic Random Walk Models for Comparative Network Analysis

    Graph-based systems and data analysis methods have become critical tools in many fields because they provide an intuitive way of representing and analyzing interactions between variables. Owing to advances in measurement techniques, a massive amount of labeled data that can be represented as nodes on a graph (or network) has been archived in databases. Additionally, novel data without label information have been gradually generated and archived. Labeling and identifying the characteristics of novel data is an important first step in utilizing the data in an effective and meaningful way. Comparative network analysis is an effective computational means to identify and predict the properties of the unlabeled data by comparing the similarities and differences between well-studied and less-studied networks. It aims to identify the matching nodes and conserved subnetworks across multiple networks so that the properties of nodes in the less-studied networks can be predicted from the properties of the matching nodes in the well-studied networks (i.e., transferring knowledge between networks). A fundamental question in comparative network analysis is how to accurately estimate node-to-node correspondence, as it can be a critical clue in analyzing the similarities and differences between networks. Node correspondence is a comprehensive similarity that integrates various types of similarity measurements in a balanced manner. However, there are several challenges in accurately estimating the node correspondence for large-scale networks. First, the scale of the networks is a critical issue. As networks generally include a large number of nodes, an extremely large space has to be examined, which poses a computational challenge due to the combinatorial nature of the problem. Furthermore, although there are matching nodes and conserved subnetworks in different networks, structural variations such as node insertions and deletions make it difficult to integrate a topological similarity. In this dissertation, novel probabilistic random walk models are proposed to accurately estimate node-to-node correspondence between networks. First, we propose a context-sensitive random walk (CSRW) model. In the CSRW model, the random walker analyzes the context of its current position and can switch the random movement to either a simultaneous walk on both networks or an individual walk on one of the networks. The context-sensitive nature of the random walker enables the method to effectively integrate different types of similarities by dealing with structural variations. Second, we propose the CUFID (Comparative network analysis Using the steady-state network Flow to IDentify orthologous proteins) model. In the CUFID model, we construct an integrated network by inserting pseudo edges between potential matching nodes in different networks. Then, we design the random walk protocol to transit more frequently between potential matching nodes as their node similarity increases and they have more matching neighboring nodes. We apply the proposed random walk models to comparative network analysis problems: global network alignment and network querying. Through extensive performance evaluations, we demonstrate that the proposed random walk models can accurately estimate node correspondence and that these estimates can lead to improved and reliable network comparison results.
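
The idea of scoring node correspondence by steady-state flow over pseudo edges can be illustrated with a minimal Python sketch. This is not the dissertation's actual model: the function name `pseudo_edge_flow`, the unit weight on within-network edges, the pseudo-edge weight `s * (1 + support)`, and the lazy power iteration are all assumptions chosen for illustration. The sketch only mirrors the stated intent that the walker should cross more often between candidate matches with higher similarity and more matching neighbors.

```python
from collections import defaultdict


def pseudo_edge_flow(edges_a, edges_b, similarity, iters=500):
    """Score candidate node pairs by steady-state flow over pseudo edges.

    edges_a, edges_b: undirected edges (u, v) inside network A and network B.
    similarity: dict {(a, b): s} with s > 0 for each candidate match.
    Assumed (hypothetical) pseudo-edge weight: s * (1 + support), where
    support counts neighbor pairs that are themselves candidate matches.
    """
    nbr_a, nbr_b = defaultdict(set), defaultdict(set)
    for u, v in edges_a:
        nbr_a[u].add(v)
        nbr_a[v].add(u)
    for u, v in edges_b:
        nbr_b[u].add(v)
        nbr_b[v].add(u)

    # Weighted adjacency of the integrated network; nodes are tagged by network.
    w = defaultdict(dict)
    for u, v in edges_a:
        w[("A", u)][("A", v)] = w[("A", v)][("A", u)] = 1.0
    for u, v in edges_b:
        w[("B", u)][("B", v)] = w[("B", v)][("B", u)] = 1.0
    for (a, b), s in similarity.items():
        support = sum(
            1 for x in nbr_a[a] for y in nbr_b[b] if (x, y) in similarity
        )
        w[("A", a)][("B", b)] = w[("B", b)][("A", a)] = s * (1 + support)

    nodes = list(w)
    out_weight = {n: sum(w[n].values()) for n in nodes}

    # Lazy random walk (stay put with probability 0.5) avoids periodicity.
    pi = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: 0.5 * pi[n] for n in nodes}
        for n in nodes:
            for m, wt in w[n].items():
                nxt[m] += 0.5 * pi[n] * wt / out_weight[n]
        pi = nxt

    # Flow across a pseudo edge: probability mass crossing it in either direction.
    scores = {}
    for a, b in similarity:
        na, nb = ("A", a), ("B", b)
        scores[(a, b)] = (
            pi[na] * w[na][nb] / out_weight[na]
            + pi[nb] * w[nb][na] / out_weight[nb]
        )
    return scores
```

On two three-node paths with candidate pairs (a1, b1), (a2, b2), (a3, b3), and (a1, b3), the middle pair (a2, b2) receives the largest score because all of its candidate-matched neighbors support it.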

    Simultaneous Optimization of Both Node and Edge Conservation in Network Alignment via WAVE

    Network alignment can be used to transfer functional knowledge between conserved regions of different networks. Typically, existing methods use a node cost function (NCF) to compute similarity between nodes in different networks and an alignment strategy (AS) to find high-scoring alignments with respect to the total NCF over all aligned nodes (or node conservation). However, they then evaluate the quality of their alignments via some other measure that is different from the node conservation measure used to guide the alignment construction process. Typically, one measures the amount of conserved edges, but only after alignments are produced. Hence, a recent attempt aimed to directly maximize the amount of conserved edges while constructing alignments, which improved alignment accuracy. Here, we aim to directly maximize both node and edge conservation during alignment construction to further improve alignment accuracy. For this, we design a novel measure of edge conservation that (unlike existing measures that treat each conserved edge the same) weighs each conserved edge so that edges with highly NCF-similar end nodes are favored. As a result, we introduce a novel AS, Weighted Alignment VotEr (WAVE), which can optimize any measures of node and edge conservation, and which can be used with any NCF or combination of multiple NCFs. Using WAVE on top of established state-of-the-art NCFs leads to superior alignments compared to the existing methods that optimize only node conservation or only edge conservation or that treat each conserved edge the same. And while we evaluate WAVE in the computational biology domain, it is easily applicable in any domain.
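
The difference between counting conserved edges and weighting them by end-node similarity can be made concrete with a short Python sketch. The abstract does not give WAVE's formula, so the weight used here (the mean similarity of the two aligned end-node pairs) and the names `edge_conservation` and `node_sim` are assumptions for illustration only.

```python
def edge_conservation(edges_g, edges_h, alignment, node_sim=None):
    """Return (conserved edge count, weighted edge conservation).

    edges_g, edges_h: undirected edges of the two networks.
    alignment: dict mapping nodes of G to nodes of H.
    node_sim: optional dict {(g_node, h_node): similarity in [0, 1]}.
    Assumed (hypothetical) weight of a conserved edge (u, v): the mean of
    node_sim for its two aligned end-node pairs; 1.0 if node_sim is None.
    """
    h_edges = {frozenset(e) for e in edges_h}
    count, weighted = 0, 0.0
    for u, v in edges_g:
        if u not in alignment or v not in alignment:
            continue
        # An edge is conserved when its image under the alignment is an edge of H.
        if frozenset((alignment[u], alignment[v])) in h_edges:
            count += 1
            if node_sim is None:
                weighted += 1.0
            else:
                weighted += 0.5 * (
                    node_sim[(u, alignment[u])] + node_sim[(v, alignment[v])]
                )
    return count, weighted
```

With `node_sim=None` both returned values coincide, which corresponds to treating every conserved edge the same; supplying similarities makes edges between well-matched nodes contribute more.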

    PROPER: global protein interaction network alignment through percolation matching

    Background: The alignment of protein-protein interaction (PPI) networks enables us to uncover the relationships between different species, which leads to a deeper understanding of biological systems. Network alignment can be used to transfer biological knowledge between species. Although different PPI network alignment algorithms were introduced during the last decade, developing an accurate and scalable algorithm that can find alignments with high biological and structural similarities among PPI networks is still challenging. Results: In this paper, we introduce a new global network alignment algorithm for PPI networks called PROPER. Compared to other global network alignment methods, our algorithm shows higher accuracy and speed over real PPI datasets and synthetic networks. We show that the PROPER algorithm can detect large portions of conserved biological pathways between species. Also, using a simple parsimonious evolutionary model, we explain why PROPER performs well based on several different comparison criteria. Conclusions: We highlight that PROPER has high potential in further applications such as detecting biological pathways, finding protein complexes and PPI prediction. The PROPER algorithm is available at http://proper.epfl.ch

    Mining host-pathogen protein interactions to characterize Burkholderia mallei infectivity mechanisms.

    Burkholderia pathogenicity relies on protein virulence factors to control and promote bacterial internalization, survival, and replication within eukaryotic host cells. We recently used yeast two-hybrid (Y2H) screening to identify a small set of novel Burkholderia proteins that were shown to attenuate disease progression in an aerosol infection animal model using the virulent Burkholderia mallei ATCC 23344 strain. Here, we performed an extended analysis of primarily nine B. mallei virulence factors and their interactions with human proteins to map out how the bacteria can influence and alter host processes and pathways. Specifically, we employed topological analyses to assess the connectivity patterns of targeted host proteins, identify modules of pathogen-interacting host proteins linked to processes promoting infectivity, and evaluate the effect of crosstalk among the identified host protein modules. Overall, our analysis showed that the targeted host proteins generally had a large number of interacting partners and interacted with other host proteins that were also targeted by B. mallei proteins. We also introduced a novel Host-Pathogen Interaction Alignment (HPIA) algorithm and used it to explore similarities between host-pathogen interactions of B. mallei, Yersinia pestis, and Salmonella enterica. We inferred putative roles of B. mallei proteins based on the roles of their aligned Y. pestis and S. enterica partners and showed that up to 73% of the predicted roles matched existing annotations. A key insight into Burkholderia pathogenicity derived from these analyses of Y2H host-pathogen interactions is the identification of eukaryotic-specific targeted cellular mechanisms, including the ubiquitination degradation system and the use of the focal adhesion pathway as a fulcrum for transmitting mechanical forces and regulatory signals. This provides the mechanisms to modulate and adapt the host-cell environment for the successful establishment of host infections and intracellular spread.

    Host pathways targeted by Coxiella.

    C. burnetii-interacting host proteins are present in interconnected Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways with the potential to affect multiple cellular processes of the host. The pathways are grouped into five major categories: RNA processing, protein processing, degradation pathways, signaling (including signaling events related to the immune response), and metabolism. The size of a star indicates the number of targeted host proteins in each pathway. ECM, extracellular matrix; ER, endoplasmic reticulum; ErbB, erythroblastic leukemia viral oncogene; ESCRT, endosomal sorting complexes required for transport; MAPK, mitogen-activated protein kinase; NOD, nucleotide-binding oligomerization domain; PIK3, phosphatidylinositol-3-kinases; TCA, tricarboxylic acid; TGF, transforming growth factor.

    Mechanisms of action of Coxiella burnetii effectors inferred from host-pathogen protein interactions

    Coxiella burnetii is an obligate Gram-negative intracellular pathogen and the etiological agent of Q fever. Successful infection requires a functional Type IV secretion system, which translocates more than 100 effector proteins into the host cytosol to establish the infection, restructure the intracellular host environment, and create a parasitophorous vacuole where the replicating bacteria reside. We used yeast two-hybrid (Y2H) screening of 33 selected C. burnetii effectors against whole genome human and murine proteome libraries to generate a map of potential host-pathogen protein-protein interactions (PPIs). We detected 273 unique interactions between 20 pathogen and 247 human proteins, and 157 between 17 pathogen and 137 murine proteins. We used orthology to combine the data and create a single host-pathogen interaction network containing 415 unique interactions between 25 C. burnetii and 363 human proteins. We further performed complementary pairwise Y2H testing of 43 out of 91 C. burnetii-human interactions involving five pathogen proteins. We used the combined data to 1) perform enrichment analyses of target host cellular processes and pathways, 2) examine effectors with known infection phenotypes, and 3) infer potential mechanisms of action for four effectors with uncharacterized functions. The host-pathogen interaction profiles supported known Coxiella phenotypes, such as adapting cell morphology through cytoskeletal re-arrangements, protein processing and trafficking, organelle generation, cholesterol processing, innate immune modulation, and interactions with the ubiquitin and proteasome pathways. The generated dataset of PPIs, the largest collection of unbiased Coxiella host-pathogen interactions to date, represents a rich source of information with respect to secreted pathogen effector proteins and their interactions with human host proteins.

    Topological properties of human proteins interacting with B. mallei.

    SD: standard deviation. We evaluated the following properties of the host proteins that interacted with B. mallei proteins based on the human protein-protein interaction (PPI) network [25]: the number of these host proteins in the human PPI network (N_p); the average number of interacting partners (in the human PPI network) of each host protein (D); the clustering coefficient, i.e., the number of interactions among the nearest neighbors (C); the average shortest path between any two proteins in the set (SP); the average number of interacting partners in the human PPI network where both partners interact with B. mallei proteins (D_i); and the number of host proteins in the largest connected component (N_p^LCC). The top three rows show the results for the host proteins present in the PPI that interacted with the nine known virulence factors, whereas the three lower rows correspond to host proteins that interacted with all 21 tested B. mallei proteins from the yeast two-hybrid screening (known and putative virulence factors). The results for the randomly selected (498 or 619) human proteins from the entire human PPI network (All PPIs) were generated through 10^3 random repetitions to create averages and standard deviations. The indicated p-values correspond to the probability of the observed properties being different from the randomly selected set from all PPIs.
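
The properties listed in this caption can be computed on any undirected interaction network with a few lines of plain Python. The sketch below is an illustration, not the authors' pipeline: the function name `topology_summary` and the dictionary keys are hypothetical, the clustering coefficient is assumed to be the standard local clustering coefficient averaged over the target proteins, shortest paths are assumed to be measured in the full network over reachable target pairs, and the largest connected component is assumed to be taken in the subgraph induced by the targets.

```python
from collections import deque


def topology_summary(edges, targets):
    """Summarize topological properties of target nodes in an undirected network."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    present = [t for t in dict.fromkeys(targets) if t in adj]  # targets in network
    if not present:
        raise ValueError("no target node is present in the network")
    tset = set(present)
    n_p = len(present)

    def clustering(t):
        # Fraction of neighbor pairs of t that are themselves connected.
        nb = list(adj[t])
        k = len(nb)
        if k < 2:
            return 0.0
        links = sum(
            1 for i in range(k) for j in range(i + 1, k) if nb[j] in adj[nb[i]]
        )
        return 2.0 * links / (k * (k - 1))

    def distances_from(src):
        # Breadth-first search over the full network.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        return dist

    path_lengths = []
    for i, s in enumerate(present):
        dist = distances_from(s)
        path_lengths.extend(dist[t] for t in present[i + 1:] if t in dist)

    # Largest connected component of the subgraph induced by the targets.
    seen, largest = set(), 0
    for s in present:
        if s in seen:
            continue
        seen.add(s)
        queue, size = deque([s]), 0
        while queue:
            x = queue.popleft()
            size += 1
            for y in adj[x] & tset:
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
        largest = max(largest, size)

    return {
        "N_p": n_p,
        "D": sum(len(adj[t]) for t in present) / n_p,
        "C": sum(clustering(t) for t in present) / n_p,
        "SP": sum(path_lengths) / len(path_lengths) if path_lengths else None,
        "D_i": sum(len(adj[t] & tset) for t in present) / n_p,
        "N_p_LCC": largest,
    }
```

Repeating the same computation on randomly drawn node sets of equal size gives a background distribution of averages and standard deviations against which the observed values can be compared, in the spirit of the random repetitions described in the caption.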